In [1]:

    
import graphlab as gl
gl.canvas.set_target("ipynb")



In [2]:

    
implicit = gl.SFrame('implicit')
explicit = gl.SFrame('explicit')
items = gl.SFrame('items')
ratings = gl.SFrame('ratings')









    



[INFO] This commercial license of GraphLab Create is assigned to engr@turi.com.

[INFO] Start server at: ipc:///tmp/graphlab_server-41454 - Server binary: /Users/chris/miniconda/lib/python2.7/site-packages/graphlab/unity_server - Server log: /tmp/graphlab_server_1443481858.log
[INFO] GraphLab Server Version: 1.6.1



In [3]:

    
ratings.show()

Split the data into a training set and a validation set

Evaluating on held-out interactions lets us measure how well each model generalizes to data it was not trained on.



In [4]:

    
train, valid = gl.recommender.util.random_split_by_user(implicit)
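
To make the idea of a per-user split concrete, here is a minimal plain-Python sketch. It is not GraphLab's implementation: the function name `split_by_user`, the 20% hold-out fraction, and the toy data are assumptions for illustration, and the library's own function has further options that this sketch omits.

```python
import random
from collections import defaultdict


def split_by_user(rows, test_fraction=0.2, seed=0):
    """Hold out a random share of every user's interactions.

    rows: list of (user_id, item_id) pairs.
    Returns (train_rows, held_out_rows).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in rows:
        by_user[user].append(item)

    train_rows, held_out_rows = [], []
    for user in sorted(by_user):
        for item in by_user[user]:
            # Each interaction is held out with probability test_fraction.
            if rng.random() < test_fraction:
                held_out_rows.append((user, item))
            else:
                train_rows.append((user, item))
    return train_rows, held_out_rows


# Hypothetical data: 5 users who each interacted with 20 items.
rows = [(u, i) for u in range(5) for i in range(20)]
train_rows, held_out_rows = split_by_user(rows)
```

Every interaction lands in exactly one of the two lists, so the held-out rows are never seen during training.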

Feature engineering

Compute the number of times each item has been rated.



In [5]:

    
num_ratings_per_item = train.groupby('item_id', {'num_users': gl.aggregate.COUNT})
items = items.join(num_ratings_per_item, on='item_id')
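
The count-then-join step can be sketched in plain Python as follows. The toy `interactions` and `item_table` values are hypothetical stand-ins for `train` and `items`; the sketch assumes an inner join, which would explain why `items` ends up with 3526 rows, the same number of items reported in the training logs.

```python
from collections import Counter

# Hypothetical (user_id, item_id) interactions standing in for `train`.
interactions = [(1, 'a'), (2, 'a'), (3, 'a'), (1, 'b'), (2, 'c'), (3, 'c')]

# Count how many interactions each item received (the COUNT aggregate).
num_users = Counter(item for _, item in interactions)

# Hypothetical item table; item 'd' never appears in the interactions.
item_table = [{'item_id': 'a'}, {'item_id': 'b'},
              {'item_id': 'c'}, {'item_id': 'd'}]

# Inner join on item_id: items without any interaction are dropped.
joined = [dict(row, num_users=num_users[row['item_id']])
          for row in item_table if row['item_id'] in num_users]
```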

Transform the count into a categorical variable using the feature_engineering module.



In [6]:

    
binner = gl.feature_engineering.FeatureBinner(features=['num_users'], strategy='logarithmic', num_bins=5)
items = binner.fit_transform(items)
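
A rough sketch of what logarithmic binning with five bins amounts to, assuming break points at powers of ten (1, 10, 100, 1000) and right-inclusive intervals. That assumption is consistent with the `num_users` labels printed further down, but the helper `log_bin_label` is hypothetical and is not the FeatureBinner implementation.

```python
from bisect import bisect_left


def log_bin_label(value, num_bins=5):
    """Return the label of the power-of-ten bin that contains value."""
    # num_bins bins need num_bins - 1 break points: 1, 10, 100, ...
    breaks = [10.0 ** i for i in range(num_bins - 1)]
    # Bins are right-inclusive, so a value equal to a break stays in the lower bin.
    idx = bisect_left(breaks, value)
    lower = '-Inf' if idx == 0 else '%.3f' % breaks[idx - 1]
    upper = 'Inf' if idx == len(breaks) else '%.3f' % breaks[idx]
    return '(%s, %s]' % (lower, upper)


labels = [log_bin_label(v) for v in (1, 42, 100, 350, 2077)]
```

Binning the count this way turns a heavy-tailed numeric column into a handful of categories, so a blockbuster and a modestly popular film differ by one bin rather than by an order of magnitude.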

Convert each item's genres into a dictionary keyed by genre (each value set to 1), and cast each year to an integer.



In [7]:

    
items['genres'] = items['genres'].apply(lambda x: {k:1 for k in x})
items['year'] = items['year'].astype(int)



In [8]:

    
items









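    Out[8]:

+---------+----------------------------------------------+---------------------------------+------+---------------------+
| item_id | genres                                       | title                           | year | num_users           |
+---------+----------------------------------------------+---------------------------------+------+---------------------+
| 1       | {"Children's": 1, 'Comedy': 1, 'Animati ...  | Toy Story                       | 1995 | (1000.000, Inf]     |
| 2       | {"Children's": 1, 'Adventure': 1, ...        | Jumanji                         | 1995 | (100.000, 1000.000] |
| 3       | {'Romance': 1, 'Comedy': 1} ...              | Grumpier Old Men                | 1995 | (100.000, 1000.000] |
| 4       | {'Drama': 1, 'Comedy': 1}                    | Waiting to Exhale               | 1995 | (10.000, 100.000]   |
| 5       | {'Comedy': 1}                                | Father of the Bride Part II ... | 1995 | (10.000, 100.000]   |
| 6       | {'Action': 1, 'Thriller': 1, 'Crime': 1} ... | Heat                            | 1995 | (100.000, 1000.000] |
| 7       | {'Romance': 1, 'Comedy': 1} ...              | Sabrina                         | 1995 | (100.000, 1000.000] |
| 8       | {"Children's": 1, 'Adventure': 1} ...        | Tom and Huck                    | 1995 | (10.000, 100.000]   |
| 9       | {'Action': 1}                                | Sudden Death                    | 1995 | (10.000, 100.000]   |
| 10      | {'Action': 1, 'Adventure': 1, ...            | GoldenEye                       | 1995 | (100.000, 1000.000] |
+---------+----------------------------------------------+---------------------------------+------+---------------------+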

[3526 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Train models

Collaborative filtering approach that scores pairs of items by the Jaccard similarity of the sets of users who interacted with each item



In [9]:

    
m0 = gl.item_similarity_recommender.create(train)









    



PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Column 'score' ignored.
PROGRESS:     To use this column as the target, set target = "score" and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Data has 555786 observations with 6038 users and 3526 items.
PROGRESS:     Data prepared in: 0.498567s
PROGRESS: Computing item similarity statistics:
PROGRESS: Computing most similar items for 3526 items:
PROGRESS: +-----------------+-----------------+
PROGRESS: | Number of items | Elapsed Time    |
PROGRESS: +-----------------+-----------------+
PROGRESS: | 1000            | 1.64295         |
PROGRESS: | 2000            | 1.71588         |
PROGRESS: | 3000            | 1.82574         |
PROGRESS: +-----------------+-----------------+
PROGRESS: Finished training in 2.05003s
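
For reference, the Jaccard similarity between two items can be computed from the sets of users who interacted with each of them. The sketch below is a plain-Python illustration with hypothetical data, not the library's (much faster) implementation.

```python
def jaccard_similarity(users_a, users_b):
    """Size of the intersection divided by size of the union of two user sets."""
    users_a, users_b = set(users_a), set(users_b)
    union = users_a | users_b
    if not union:
        return 0.0
    return len(users_a & users_b) / float(len(union))


# Hypothetical mapping from item to the users who interacted with it.
item_users = {
    'Toy Story': {1, 2, 3, 4},
    'Toy Story 2': {2, 3, 4, 5},
    'Heat': {6, 7},
}

# Three shared users out of five distinct users overall.
sim_sequel = jaccard_similarity(item_users['Toy Story'], item_users['Toy Story 2'])
# No shared users at all.
sim_heat = jaccard_similarity(item_users['Toy Story'], item_users['Heat'])
```

Items watched by largely the same audience score close to 1, and items with disjoint audiences score 0.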

Collaborative filtering approach that learns latent factors for each user and each item



In [10]:

    
m1 = gl.ranking_factorization_recommender.create(train, max_iterations=10)









    



PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 555786 observations with 6038 users and 3526 items.
PROGRESS:     Data prepared in: 0.755589s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | adagrad  |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | binary_target                  | Assume Binary Targets                            | True     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 10       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 16.6667           | Not Viable                               |
PROGRESS: | 1       | 4.16667           | Not Viable                               |
PROGRESS: | 2       | 1.04167           | Not Viable                               |
PROGRESS: | 3       | 0.260417          | Not Viable                               |
PROGRESS: | 4       | 0.0651042         | No Decrease (1.51234 >= 1.38646)         |
PROGRESS: | 5       | 0.016276          | 1.34978                                  |
PROGRESS: | 6       | 0.00813802        | 1.35679                                  |
PROGRESS: | 7       | 0.00406901        | 1.36648                                  |
PROGRESS: | 8       | 0.00203451        | 1.37248                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.016276          | 1.34978                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 423us        | 1.38646           | 0.69317                           |             |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1       | 1.70s        | 1.33478           | 0.649814                          | 0.016276    |
PROGRESS: | 2       | 3.19s        | 1.30837           | 0.643913                          | 0.016276    |
PROGRESS: | 3       | 4.46s        | 1.29902           | 0.643703                          | 0.016276    |
PROGRESS: | 4       | 5.70s        | 1.29397           | 0.643084                          | 0.016276    |
PROGRESS: | 5       | 6.98s        | 1.28905           | 0.642419                          | 0.016276    |
PROGRESS: | 6       | 8.32s        | 1.2865            | 0.641463                          | 0.016276    |
PROGRESS: | 7       | 9.90s        | 1.28333           | 0.64064                           | 0.016276    |
PROGRESS: | 8       | 11.75s       | 1.28209           | 0.640611                          | 0.016276    |
PROGRESS: | 9       | 13.07s       | 1.28062           | 0.64002                           | 0.016276    |
PROGRESS: | 10      | 14.36s       | 1.27902           | 0.639188                          | 0.016276    |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS:        Final objective value: 1.28501
PROGRESS:        Final training Predictive Error: 0.636185
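
As a rough picture of what "latent factors" means here, a factorization model typically scores a (user, item) pair as a sum of bias terms plus the inner product of a user vector and an item vector. The sketch below assumes that generic form; the function `predict_score` and all numbers are hypothetical, and the exact objective and training procedure used by the library are not shown in this notebook.

```python
def predict_score(global_bias, user_bias, item_bias, user_factors, item_factors):
    """Bias terms plus the inner product of the two latent factor vectors."""
    interaction = sum(u * v for u, v in zip(user_factors, item_factors))
    return global_bias + user_bias + item_bias + interaction


# Hypothetical 3-dimensional factors (the run above uses num_factors = 32).
score = predict_score(0.5, 0.1, -0.2, [1.0, 0.0, 2.0], [0.5, 3.0, 0.25])
```

Training adjusts the biases and factor vectors so that observed (user, item) pairs score higher than unobserved ones; recommendations are then the highest-scoring unseen items for each user.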

Collaborative filtering approach that learns latent factors for users, items, and side data



In [11]:

    
m2 = gl.ranking_factorization_recommender.create(train, 
                                                 item_data=items[['item_id', 'year']], 
                                                 max_iterations=10)









    



PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 555786 observations with 6038 users and 3526 items.
PROGRESS:     Data prepared in: 0.764826s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | adagrad  |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | binary_target                  | Assume Binary Targets                            | True     |
PROGRESS: | side_data_factorization        | Assign Factors for Side Data                     | True     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 10       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 12.5              | Not Viable                               |
PROGRESS: | 1       | 3.125             | Not Viable                               |
PROGRESS: | 2       | 0.78125           | Not Viable                               |
PROGRESS: | 3       | 0.195312          | Not Viable                               |
PROGRESS: | 4       | 0.0488281         | No Decrease (2.55 >= 1.38646)            |
PROGRESS: | 5       | 0.012207          | No Decrease (1.48073 >= 1.38646)         |
PROGRESS: | 6       | 0.00305176        | No Decrease (1.40179 >= 1.38646)         |
PROGRESS: | 7       | 0.000762939       | No Decrease (1.40125 >= 1.38646)         |
PROGRESS: | 8       | 0.000190735       | 1.38602                                  |
PROGRESS: | 9       | 9.53674e-05       | 1.38607                                  |
PROGRESS: | 10      | 4.76837e-05       | 1.38616                                  |
PROGRESS: | 11      | 2.38419e-05       | 1.38622                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.000190735       | 1.38602                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 451us        | 1.38646           | 0.693157                          |             |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1       | 1.83s        | 1.38566           | 0.691639                          | 0.000190735 |
PROGRESS: | 2       | 3.45s        | 1.38585           | 0.690663                          | 0.000190735 |
PROGRESS: | 3       | 5.01s        | 1.38662           | 0.690082                          | 0.000190735 |
PROGRESS: | 4       | 6.52s        | 1.38766           | 0.689626                          | 0.000190735 |
PROGRESS: | 5       | 8.13s        | 1.38905           | 0.689328                          | 0.000190735 |
PROGRESS: | 6       | 9.63s        | 1.39072           | 0.689181                          | 0.000190735 |
PROGRESS: | 7       | 11.13s       | 1.39268           | 0.689199                          | 0.000190735 |
PROGRESS: | 8       | 13.41s       | 1.39486           | 0.689349                          | 0.000190735 |
PROGRESS: | 9       | 15.68s       | 1.39705           | 0.689573                          | 0.000190735 |
PROGRESS: | 10      | 17.93s       | 1.39941           | 0.689905                          | 0.000190735 |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached.
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS:        Final objective value: 1.401
PROGRESS:        Final training Predictive Error: 0.690105



In [12]:

    
m3 = gl.ranking_factorization_recommender.create(train, 
                                                 item_data=items[['item_id', 'year', 'genres']], 
                                                 max_iterations=10)









    



PROGRESS: Recsys training: model = ranking_factorization_recommender
PROGRESS: Preparing data set.
PROGRESS:     Data has 555786 observations with 6038 users and 3526 items.
PROGRESS:     Data prepared in: 0.648528s
PROGRESS: Training ranking_factorization_recommender for recommendations.
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | Parameter                      | Description                                      | Value    |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS: | num_factors                    | Factor Dimension                                 | 32       |
PROGRESS: | regularization                 | L2 Regularization on Factors                     | 1e-09    |
PROGRESS: | solver                         | Solver used for training                         | adagrad  |
PROGRESS: | linear_regularization          | L2 Regularization on Linear Coefficients         | 1e-09    |
PROGRESS: | binary_target                  | Assume Binary Targets                            | True     |
PROGRESS: | side_data_factorization        | Assign Factors for Side Data                     | True     |
PROGRESS: | max_iterations                 | Maximum Number of Iterations                     | 10       |
PROGRESS: +--------------------------------+--------------------------------------------------+----------+
PROGRESS:   Optimizing model using SGD; tuning step size.
PROGRESS:   Using 69473 / 555786 points for tuning the step size.
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Attempt | Initial Step Size | Estimated Objective Value                |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | 0       | 10                | Not Viable                               |
PROGRESS: | 1       | 2.5               | Not Viable                               |
PROGRESS: | 2       | 0.625             | Not Viable                               |
PROGRESS: | 3       | 0.15625           | Not Viable                               |
PROGRESS: | 4       | 0.0390625         | No Decrease (5.02175 >= 1.38651)         |
PROGRESS: | 5       | 0.00976562        | 1.33702                                  |
PROGRESS: | 6       | 0.00488281        | No Decrease (1.49971 >= 1.38651)         |
PROGRESS: | 7       | 0.0012207         | No Decrease (1.41472 >= 1.38651)         |
PROGRESS: | 8       | 0.000305176       | No Decrease (1.38711 >= 1.38651)         |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: | Final   | 0.00976562        | 1.33702                                  |
PROGRESS: +---------+-------------------+------------------------------------------+
PROGRESS: Starting Optimization.
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Iter.   | Elapsed Time | Approx. Objective | Approx. Training Predictive Error | Step Size   |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | Initial | 420us        | 1.3865            | 0.693066                          |             |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: | 1       | 2.39s        | 1.67735           | 0.863704                          | 0.00976562  |
PROGRESS: | 2       | 4.58s        | 2.26991           | 1.39687                           | 0.00976562  |
PROGRESS: | 3       | 6.85s        | 2.65462           | 1.71702                           | 0.00976562  |
PROGRESS: | 4       | 11.47s       | DIVERGED          | DIVERGED                          | 0.00976562  |
PROGRESS: | RESET   | 13.13s       | 1.38655           | 0.693141                          |             |
PROGRESS: | 1       | 15.65s       | 1.45915           | 0.716086                          | 0.00488281  |
PROGRESS: | 2       | 18.06s       | 1.5529            | 0.843311                          | 0.00488281  |
PROGRESS: | 3       | 20.77s       | 1.66115           | 0.963798                          | 0.00488281  |
PROGRESS: | 4       | 22.90s       | 1.72651           | 1.03845                           | 0.00488281  |
PROGRESS: | 5       | 26.54s       | DIVERGED          | DIVERGED                          | 0.00488281  |
PROGRESS: | RESET   | 27.45s       | 1.3865            | 0.693157                          |             |
PROGRESS: | 1       | 29.61s       | 1.48426           | 0.694865                          | 0.00244141  |
PROGRESS: | 2       | 32.15s       | 1.54694           | 0.769582                          | 0.00244141  |
PROGRESS: | 3       | 34.43s       | 1.55958           | 0.79954                           | 0.00244141  |
PROGRESS: | 4       | 37.21s       | 1.58258           | 0.830171                          | 0.00244141  |
PROGRESS: | 5       | 41.71s       | DIVERGED          | DIVERGED                          | 0.00244141  |
PROGRESS: | RESET   | 42.69s       | 1.38653           | 0.693114                          |             |
PROGRESS: | 1       | 45.29s       | 1.40832           | 0.674002                          | 0.0012207   |
PROGRESS: +---------+--------------+-------------------+-----------------------------------+-------------+
PROGRESS: Optimization Complete: Maximum number of passes through the data reached (hard limit).
PROGRESS: Computing final objective value and training Predictive Error.
PROGRESS:        Final objective value: 1.43323
PROGRESS:        Final training Predictive Error: 0.685329

Train a recommender that leverages the similarity between items

Create a nearest-neighbors model whose distance combines the Jaccard distance between genre sets with the Euclidean distance between release years.



In [14]:

    
dist = [[['genres'], 'jaccard', 1.0], 
        [['year'], 'euclidean', 1.0]]
nn_model = gl.nearest_neighbors.create(items, 'item_id', composite_params=dist)









    



Defaulting to brute force instead of ball tree because there are multiple distance components.
PROGRESS: Starting brute force nearest neighbors model training.

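Compute a nearest-neighbor graph: query the model for each item's 100 nearest neighbors, attach titles to both ends of each pair, and convert the distance into a score.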



In [15]:

    
similar = nn_model.query(items, 'item_id', k=100)\
             .rename({'query_label': 'item_id', 'reference_label': 'similar', 'distance': 'score'})\
             .join(items[['item_id', 'title']], on='item_id')\
             .join(items[['item_id', 'title']], on={'similar': 'item_id'})
similar['score'] = 1 - similar['score']
similar.print_rows(100, max_row_width=200)









    



PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 1            | 3526    | 0.0283607   | 33.558ms     |
PROGRESS: | 192          | 676992  | 5.44526     | 1.03s        |
PROGRESS: | 380          | 1339880 | 10.7771     | 2.04s        |
PROGRESS: | 569          | 2006294 | 16.1373     | 3.04s        |
PROGRESS: | 776          | 2736176 | 22.0079     | 4.03s        |
PROGRESS: | 990          | 3490740 | 28.0771     | 5.04s        |
PROGRESS: | 1190         | 4195940 | 33.7493     | 6.04s        |
PROGRESS: | 1387         | 4890562 | 39.3364     | 7.04s        |
PROGRESS: | 1572         | 5542872 | 44.5831     | 8.05s        |
PROGRESS: | 1761         | 6209286 | 49.9433     | 9.04s        |
PROGRESS: | 1967         | 6935642 | 55.7856     | 10.05s       |
PROGRESS: | 2163         | 7626738 | 61.3443     | 11.05s       |
PROGRESS: | 2361         | 8324886 | 66.9597     | 12.05s       |
PROGRESS: | 2563         | 9037138 | 72.6886     | 13.05s       |
PROGRESS: | 2744         | 9675344 | 77.8219     | 14.05s       |
PROGRESS: | 2883         | 1e+07   | 81.764      | 15.05s       |
PROGRESS: | 3044         | 1.1e+07 | 86.3301     | 16.07s       |
PROGRESS: | 3246         | 1.1e+07 | 92.059      | 17.05s       |
PROGRESS: | 3436         | 1.2e+07 | 97.4475     | 18.06s       |
PROGRESS: | Done         |         | 100         | 18.58s       |
PROGRESS: +--------------+---------+-------------+--------------+
+---------+---------+----------------+------+-----------+---------------------+
| item_id | similar |     score      | rank |   title   |       title.1       |
+---------+---------+----------------+------+-----------+---------------------+
|    1    |    1    |      1.0       |  1   | Toy Story |      Toy Story      |
|    1    |   3114  |      -5.0      |  2   | Toy Story |     Toy Story 2     |
|    1    |    34   |      -8.5      |  3   | Toy Story |         Babe        |
|    1    |   110   |      -9.0      |  4   | Toy Story |      Braveheart     |
|    1    |   608   |      -9.0      |  5   | Toy Story |        Fargo        |
|    1    |   356   |     -10.8      |  6   | Toy Story |     Forrest Gump    |
|    1    |    32   |     -11.0      |  7   | Toy Story |    Twelve Monkeys   |
|    1    |   296   |     -11.0      |  8   | Toy Story |     Pulp Fiction    |
|    1    |   2599  | -11.6666666667 |  9   | Toy Story |       Election      |
|    1    |   1265  |     -11.75     |  10  | Toy Story |    Groundhog Day    |
|    1    |   3578  |     -12.0      |  11  | Toy Story |      Gladiator      |
|    1    |   1580  | -12.8333333333 |  12  | Toy Story |     Men in Black    |
|    1    |   333   | -13.6666666667 |  13  | Toy Story |      Tommy Boy      |
|    1    |    21   |     -13.8      |  14  | Toy Story |      Get Shorty     |
|    1    |   3175  |     -13.8      |  15  | Toy Story |     Galaxy Quest    |
|    1    |   151   |     -14.0      |  16  | Toy Story |       Rob Roy       |
|    1    |   480   |     -14.0      |  17  | Toy Story |    Jurassic Park    |
|    1    |   1036  |     -14.0      |  18  | Toy Story |       Die Hard      |
|    1    |   1213  |     -14.0      |  19  | Toy Story |      GoodFellas     |
|    1    |   2355  |     -14.0      |  20  | Toy Story |    Bug's Life, A    |
|    1    |   2916  |     -14.0      |  21  | Toy Story |     Total Recall    |
|    1    |   2959  |     -14.0      |  22  | Toy Story |      Fight Club     |
|    1    |   457   |     -15.0      |  23  | Toy Story |    Fugitive, The    |
|    1    |   800   |     -15.0      |  24  | Toy Story |      Lone Star      |
|    1    |   1872  |     -15.0      |  25  | Toy Story |        Go Now       |
|    1    |   2571  |     -15.0      |  26  | Toy Story |     Matrix, The     |
|    1    |    13   | -15.3333333333 |  27  | Toy Story |        Balto        |
|    1    |   174   | -15.6666666667 |  28  | Toy Story |      Jury Duty      |
|    1    |   743   | -15.6666666667 |  29  | Toy Story |       Spy Hard      |
|    1    |    45   |     -15.75     |  30  | Toy Story |      To Die For     |
|    1    |   2797  |     -15.75     |  31  | Toy Story |         Big         |
|    1    |   327   | -15.8333333333 |  32  | Toy Story |      Tank Girl      |
|    1    |    18   |     -16.0      |  33  | Toy Story |      Four Rooms     |
|    1    |    22   |     -16.0      |  34  | Toy Story |       Copycat       |
|    1    |    24   |     -16.0      |  35  | Toy Story |        Powder       |
|    1    |    50   |     -16.0      |  36  | Toy Story | Usual Suspects, The |
|    1    |    67   |     -16.0      |  37  | Toy Story |       Two Bits      |
|    1    |   145   |     -16.0      |  38  | Toy Story |       Bad Boys      |
|    1    |   160   |     -16.0      |  39  | Toy Story |        Congo        |
|    1    |   179   |     -16.0      |  40  | Toy Story |       Mad Love      |
|    1    |   194   |     -16.0      |  41  | Toy Story |        Smoke        |
|    1    |   279   |     -16.0      |  42  | Toy Story |      My Family      |
|    1    |   331   |     -16.0      |  43  | Toy Story |      Tom & Viv      |
|    1    |   388   |     -16.0      |  44  | Toy Story |      Boys Life      |
|    1    |   553   |     -16.0      |  45  | Toy Story |      Tombstone      |
|    1    |   736   |     -16.0      |  46  | Toy Story |       Twister       |
|    1    |   850   |     -16.0      |  47  | Toy Story |        Cyclo        |
|    1    |   1138  |     -16.0      |  48  | Toy Story |       Dadetown      |
|    1    |   1704  |     -16.0      |  49  | Toy Story |  Good Will Hunting  |
|    1    |    69   | -16.6666666667 |  50  | Toy Story |        Friday       |
|    1    |   171   | -16.6666666667 |  51  | Toy Story |       Jeffrey       |
|    1    |   187   | -16.6666666667 |  52  | Toy Story |      Party Girl     |
|    1    |   102   | -16.6666666667 |  53  | Toy Story |      Mr. Wrong      |
|    1    |   411   | -16.6666666667 |  54  | Toy Story |     You So Crazy    |
|    1    |   505   | -16.6666666667 |  55  | Toy Story |        North        |
|    1    |   1414  | -16.6666666667 |  56  | Toy Story |        Mother       |
|    1    |   158   |     -16.75     |  57  | Toy Story |        Casper       |
|    1    |   235   |     -16.75     |  58  | Toy Story |       Ed Wood       |
|    1    |   256   |     -16.75     |  59  | Toy Story |        Junior       |
|    1    |   289   |     -16.75     |  60  | Toy Story |       Only You      |
|    1    |   304   |     -16.75     |  61  | Toy Story |      Roommates      |
|    1    |   550   |     -16.75     |  62  | Toy Story |      Threesome      |
|    1    |   852   |     -16.75     |  63  | Toy Story |       Tin Cup       |
|    1    |   1784  |     -16.75     |  64  | Toy Story |  As Good As It Gets |
|    1    |   2108  |     -16.75     |  65  | Toy Story |      L.A. Story     |
|    1    |   2858  |     -16.75     |  66  | Toy Story |   American Beauty   |
|    1    |    6    |     -17.0      |  67  | Toy Story |         Heat        |
|    1    |    10   |     -17.0      |  68  | Toy Story |      GoldenEye      |
|    1    |    14   |     -17.0      |  69  | Toy Story |        Nixon        |
|    1    |    16   |     -17.0      |  70  | Toy Story |        Casino       |
|    1    |    20   |     -17.0      |  71  | Toy Story |     Money Train     |
|    1    |    26   |     -17.0      |  72  | Toy Story |       Othello       |
|    1    |    76   |     -17.0      |  73  | Toy Story |      Screamers      |
|    1    |    77   |     -17.0      |  74  | Toy Story |      Nico Icon      |
|    1    |   159   |     -17.0      |  75  | Toy Story |       Clockers      |
|    1    |   170   |     -17.0      |  76  | Toy Story |       Hackers       |
|    1    |   190   |     -17.0      |  77  | Toy Story |         Safe        |
|    1    |   208   |     -17.0      |  78  | Toy Story |      Waterworld     |
|    1    |   227   |     -17.0      |  79  | Toy Story |      Drop Zone      |
|    1    |   240   |     -17.0      |  80  | Toy Story |       Hideaway      |
|    1    |   297   |     -17.0      |  81  | Toy Story |       Panther       |
|    1    |   300   |     -17.0      |  82  | Toy Story |      Quiz Show      |
|    1    |   379   |     -17.0      |  83  | Toy Story |       Timecop       |
|    1    |   425   |     -17.0      |  84  | Toy Story |       Blue Sky      |
|    1    |   461   |     -17.0      |  85  | Toy Story |       Go Fish       |
|    1    |   527   |     -17.0      |  86  | Toy Story |   Schindler's List  |
|    1    |   692   |     -17.0      |  87  | Toy Story |         Solo        |
|    1    |   695   |     -17.0      |  88  | Toy Story |      True Crime     |
|    1    |   742   |     -17.0      |  89  | Toy Story |       Thinner       |
|    1    |   764   |     -17.0      |  90  | Toy Story |        Heavy        |
|    1    |   835   |     -17.0      |  91  | Toy Story |       Foxfire       |
|    1    |   846   |     -17.0      |  92  | Toy Story |        Flirt        |
|    1    |   1168  |     -17.0      |  93  | Toy Story |       Bad Moon      |
|    1    |   1545  |     -17.0      |  94  | Toy Story |       Ponette       |
|    1    |   1552  |     -17.0      |  95  | Toy Story |       Con Air       |
|    1    |   1617  |     -17.0      |  96  | Toy Story |  L.A. Confidential  |
|    1    |   1842  |     -17.0      |  97  | Toy Story |       Illtown       |
|    1    |   1151  |     -17.5      |  98  | Toy Story |        Faust        |
|    1    |    48   |     -17.6      |  99  | Toy Story |      Pocahontas     |
|    1    |    65   | -17.6666666667 | 100  | Toy Story |       Bio-Dome      |
+---------+---------+----------------+------+-----------+---------------------+
[352600 rows x 6 columns]
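
Note that `1 - distance` only stays within [0, 1] when the distance itself does. Here the scores run as low as -17.67, so the composite distance is clearly larger than 1 for most pairs. If a bounded similarity is wanted, one common alternative (an assumption on my part, not what this notebook does) is `1 / (1 + distance)`, sketched below on the three smallest distances implied by the table above.

```python
def similarity_from_distance(distance):
    """Map a non-negative distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


# Distances implied by the first three scores above (distance = 1 - score).
distances = [0.0, 6.0, 9.5]
scores = [similarity_from_distance(d) for d in distances]
```

The ordering of neighbors is unchanged by either conversion; whether the sign or scale of the score matters to the downstream recommender is not something this notebook establishes.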

Use this similarity data as the basis for a recommender.



In [16]:

    
m5 = gl.item_similarity_recommender.create(train, nearest_items=similar)









    



PROGRESS: Recsys training: model = item_similarity
PROGRESS: Warning: Column 'score' ignored.
PROGRESS:     To use this column as the target, set target = "score" and use a method that allows the use of a target.
PROGRESS: Preparing data set.
PROGRESS:     Loading user-provided nearest items.
PROGRESS:     Data has 555786 observations with 6038 users and 3526 items.
PROGRESS:     Data prepared in: 1.2663s

Evaluation

Create a precision/recall plot to compare the recommendation quality of the above models on our held-out data.
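
As a reminder of what is being measured, precision at cutoff k is the share of the top-k recommendations that appear in the user's held-out items, and recall at k is the share of the held-out items that appear in the top k. A minimal sketch for a single user, with hypothetical data (the comparison tool averages these over the sampled users):

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the first k recommendations for one user."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


# Hypothetical ranked recommendations and held-out items for one user.
recommended = ['a', 'b', 'c', 'd', 'e']
relevant = {'b', 'e', 'y', 'z'}

p2, r2 = precision_recall_at_k(recommended, relevant, 2)  # one hit in the top 2
p5, r5 = precision_recall_at_k(recommended, relevant, 5)  # two hits in the top 5
```

Raising the cutoff can only keep recall the same or increase it, while precision usually falls, which is the pattern visible in the summary tables below.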



In [19]:

    
model_comparison = gl.compare(valid, [m0, m1, m2, m3, m5], user_sample=.3)









    



compare_models: using 298 users to estimate model performance
PROGRESS: Evaluate model M0

Precision and recall summary statistics by cutoff
+--------+----------------+-----------------+
| cutoff | mean_precision |   mean_recall   |
+--------+----------------+-----------------+
|   1    | 0.281879194631 | 0.0263711769818 |
|   2    | 0.263422818792 | 0.0461998105304 |
|   3    | 0.244966442953 |  0.062829557057 |
|   4    | 0.235738255034 | 0.0777983759123 |
|   5    | 0.225503355705 | 0.0882340267173 |
|   6    | 0.209731543624 | 0.0951448587842 |
|   7    | 0.200383509108 |  0.104329520094 |
|   8    | 0.192953020134 |  0.112424305266 |
|   9    | 0.186428038777 |  0.120099746396 |
|   10   | 0.180536912752 |  0.128577958505 |
+--------+----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M1

Precision and recall summary statistics by cutoff
+--------+----------------+------------------+
| cutoff | mean_precision |   mean_recall    |
+--------+----------------+------------------+
|   1    | 0.134228187919 | 0.00957333909485 |
|   2    | 0.151006711409 |  0.025808351569  |
|   3    | 0.156599552573 | 0.0351607762475  |
|   4    | 0.146812080537 | 0.0403875964733  |
|   5    | 0.142281879195 | 0.0468043145225  |
|   6    | 0.137024608501 | 0.0541186463206  |
|   7    | 0.129434324065 | 0.0599776056468  |
|   8    | 0.124580536913 | 0.0657980852227  |
|   9    | 0.119686800895 | 0.0690609490538  |
|   10   | 0.118456375839 | 0.0815028809531  |
+--------+----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M2

Precision and recall summary statistics by cutoff
+--------+-----------------+-----------------+
| cutoff |  mean_precision |   mean_recall   |
+--------+-----------------+-----------------+
|   1    |  0.167785234899 | 0.0161053142356 |
|   2    |  0.144295302013 | 0.0254801793572 |
|   3    |  0.131991051454 | 0.0318025677732 |
|   4    |      0.125      | 0.0371829043559 |
|   5    |  0.11744966443  | 0.0410638816769 |
|   6    |  0.110738255034 | 0.0468292410286 |
|   7    |  0.105465004794 | 0.0524601958501 |
|   8    |  0.101510067114 | 0.0573956656301 |
|   9    | 0.0988068605518 | 0.0602808953692 |
|   10   |  0.098322147651 | 0.0649756343573 |
+--------+-----------------+-----------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M3

Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    |  0.11744966443  | 0.00704077148453 |
|   2    |  0.114093959732 | 0.0177310726317  |
|   3    |  0.106263982103 | 0.0223298124415  |
|   4    |  0.105704697987 | 0.0331716430118  |
|   5    |  0.109395973154 | 0.0435932795409  |
|   6    |  0.104586129754 |  0.047143384739  |
|   7    |  0.103547459252 | 0.0546491993938  |
|   8    |  0.100251677852 | 0.0607113639254  |
|   9    | 0.0995525727069 |  0.06877810981   |
|   10   | 0.0946308724832 | 0.0719324763374  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

PROGRESS: Evaluate model M4

Precision and recall summary statistics by cutoff
+--------+-----------------+------------------+
| cutoff |  mean_precision |   mean_recall    |
+--------+-----------------+------------------+
|   1    | 0.0369127516779 | 0.00380229838808 |
|   2    | 0.0436241610738 | 0.00896093206526 |
|   3    | 0.0425055928412 | 0.0113163832365  |
|   4    | 0.0394295302013 | 0.0123027684318  |
|   5    | 0.0395973154362 | 0.0136800043452  |
|   6    | 0.0402684563758 | 0.0167122518012  |
|   7    | 0.0407478427613 | 0.0196711571782  |
|   8    | 0.0402684563758 |  0.020772309913  |
|   9    |  0.037658463833 | 0.0211772817451  |
|   10   | 0.0355704697987 |  0.021486414479  |
+--------+-----------------+------------------+
[10 rows x 3 columns]

Model compare metric: precision_recall
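
The tables above report mean precision and mean recall at each cutoff. As a rough sketch of what those numbers mean, the following pure-Python code computes both metrics for a single user and averages them over users. It assumes precision is hits divided by the cutoff and recall is hits divided by the number of held-out items; the exact conventions used by `gl.compare` are not shown in this notebook, and the helper names and toy data are hypothetical.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user.

    recommended -- ranked list of item ids (best first)
    relevant    -- set of held-out item ids for the same user
    """
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / float(k)
    recall = hits / float(len(relevant)) if relevant else 0.0
    return precision, recall


def mean_precision_recall(recs_by_user, held_out_by_user, k):
    """Average the per-user metrics over every user with recommendations."""
    pairs = [
        precision_recall_at_k(recs, held_out_by_user.get(user, set()), k)
        for user, recs in recs_by_user.items()
    ]
    n = float(len(pairs))
    return (sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n)


# Toy data (made up for illustration).
recs_by_user = {"u1": [10, 20, 30], "u2": [5, 6, 7]}
held_out_by_user = {"u1": {10, 30, 40}, "u2": {6}}
mean_p, mean_r = mean_precision_recall(recs_by_user, held_out_by_user, k=2)
print(mean_p, mean_r)
```

Under these assumptions, precision falls as the cutoff grows once the best hits are used up, while recall can only stay flat or rise, which matches the trend in each table.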



In [20]:

    
gl.show_comparison(model_comparison, [m0, m1, m2, m3, m5])


